Domain Overlap for Iterative Sparse Triangular Solves on GPUs
Abstract
Iterative methods for solving sparse triangular systems are an attractive alternative to exact forward and backward substitution when an approximate solution is acceptable. On modern hardware they can also be faster, because iterative methods expose more parallelism. In this paper, we investigate how block-iterative triangular solves can benefit from overlap. Because the matrices are triangular, we use “directed” overlap, whose direction depends on whether the matrix is upper or lower triangular. We enhance a GPU implementation of the block-asynchronous Jacobi method with directed overlap. On GPUs, and in other cases where the problem must be overdecomposed (i.e., there are more subdomains and threads than cores), it is preferable to process or schedule the subdomains in a specific order that follows the dependencies specified by the sparse triangular matrix. For sparse triangular factors from incomplete factorizations, we demonstrate that moderate directed overlap combined with subdomain scheduling can improve convergence and time-to-solution.
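The idea of directed overlap can be illustrated with a small sequential sketch. This is not the authors' GPU implementation: it uses synchronous sweeps instead of asynchronous updates, dense storage instead of a sparse format, and the function names (`forward_substitution`, `block_jacobi_lower`) and parameters (`block_size`, `overlap`, `sweeps`) are hypothetical. It also assumes that, for a lower triangular matrix, the overlap extends each block toward the earlier rows it depends on, and that only the rows a block owns are written back.

```python
def forward_substitution(L, b):
    """Exact solve of L x = b for a dense lower triangular L (reference)."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        s = b[i]
        for j in range(i):
            s -= L[i][j] * x[j]
        x[i] = s / L[i][i]
    return x


def block_jacobi_lower(L, b, block_size, overlap=0, sweeps=1):
    """Approximate solve of L x = b with block-Jacobi sweeps.

    Each block owns rows [start, end). With directed overlap the local
    solve begins `overlap` rows earlier (assumption: overlap points toward
    the rows a lower triangular block depends on). Values outside the
    extended block come from the previous sweep.
    """
    n = len(b)
    x = [0.0] * n  # initial guess
    for _ in range(sweeps):
        x_new = x[:]
        for start in range(0, n, block_size):
            end = min(start + block_size, n)
            lo = max(0, start - overlap)  # extended block start
            local = {}
            for i in range(lo, end):
                s = b[i]
                for j in range(i):
                    if L[i][j] != 0.0:
                        # Fresh value inside the extended block,
                        # previous-sweep value outside it.
                        s -= L[i][j] * (local[j] if j >= lo else x[j])
                local[i] = s / L[i][i]
            # Write back only the rows this block owns.
            for i in range(start, end):
                x_new[i] = local[i]
        x = x_new
    return x


def max_error(x, y):
    """Largest componentwise difference between two vectors."""
    return max(abs(a - c) for a, c in zip(x, y))
```

On a bidiagonal test matrix, information travels one block per sweep without overlap, whereas an overlap of one block width lets it travel roughly twice as far per sweep, at the cost of redundant work in the overlapped rows. Processing the blocks in dependency order with immediate updates, rather than from a frozen previous iterate, would correspond to the subdomain scheduling mentioned in the abstract.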
Similar Resources
Iterative Sparse Triangular Solves for Preconditioning
Sparse triangular solvers are typically parallelized using level-scheduling techniques, but parallel efficiency is poor on high-throughput architectures like GPUs. We propose using an iterative approach for solving sparse triangular systems when an approximation is suitable. This approach will not work for all problems, but can be successful for sparse triangular matrices arising from incomplete...
Development of Krylov and AMG Linear Solvers for Large-Scale Sparse Matrices on GPUs
This research introduces our work on developing Krylov subspace and AMG solvers on NVIDIA GPUs. Because SpMV is a crucial part of these iterative methods, SpMV algorithms for a single GPU and for multiple GPUs are implemented. A HEC matrix format and a communication mechanism are established. In addition, a set of specific algorithms for solving preconditioned systems in parallel environments is designed, i...
Parallel Solution of Sparse Triangular Linear Systems in the Preconditioned Iterative Methods on the GPU
A novel algorithm for solving in parallel a sparse triangular linear system on a graphical processing unit is proposed. It implements the solution of the triangular system in two phases. First, the analysis phase builds a dependency graph based on the matrix sparsity pattern and groups the independent rows into levels. Second, the solve phase obtains the full solution by iterating sequentially ...
On Level Scheduling for Incomplete LU Factorization Preconditioners on Accelerators
The application of the finite element method for the numerical solution of partial differential equations naturally leads to large systems of linear equations represented by a sparse system matrix A and right-hand side b. These systems are commonly solved using iterative solvers, particularly Krylov subspace methods, which are typically accelerated using preconditioners to obtain good convergenc...
Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)
Although sparse matrix-vector multiplication (SpMV) algorithms are simple, they are an important part of linear algebra algorithms in mathematics and physics. Because these algorithms can be run in parallel, graphics processing units (GPUs) have been considered among the best candidates for running them. In recent years, power consumption has been considered as one of the metr...